<html>
<head>
<title> Kristen Summers </title>
</head>


<body>
<h1> Kristen Summers </h1>
<h2> PhD Student, Cornell University <br>
 summers@cs.cornell.edu <br>
5132 Upson Hall <br>
607-255-5577</h2>

<h2> Research Interests  </h2>

<p>I work with the 
<a href="http://www.cs.cornell.edu/Info/Projects/icap.html">Information Capture and Access</a>
research group on document analysis.  My
long-term goal is to support sophisticated tools
for manipulating electronic documents:
indexing, browsing, linking, and so on.</p>

<p>My primary interest is in discovering logical
structure in arbitrary electronic documents.
The goal is to take an electronic document
representation as input and return a hierarchy
of logical pieces of the document as output.
For example, given a scanned or PostScript
version of a technical report, I would like to
be able to divide it into sections, paragraphs, etc.
Similarly, in a business letter, the address headings,
body, and closing should be identifiable.</p>

<p>This problem has two primary components:  
<a href = "http://cs-tr.cs.cornell.edu:80/TR/CORNELLCS:TR94-1452?abstract">segmentation</a>
(dividing the document into logical pieces) and
<a href = "http://www.cs.cornell.edu/Info/People/summers/classify.html">classification</a>
(categorizing the pieces).
It also raises questions of evaluation
(previous work differs in how it describes the correct hierarchy), 
<a href = "http://www.cs.cornell.edu/Info/People/summers/structures.html">types</a> of logical structures,
and theoretical limitations.</p>

<p>The task is relevant to two of Bruce Croft's
<a href = "http://www.dlib.org/dlib/november95/11croft.html">top 10
research issues for information retrieval</a>
(in the 
<a href = "http://www.dlib.org/dlib/november95/11contents.html">November
1995 issue</a> of <a href = "http://www.dlib.org/">D-Lib Magazine</a>):
number 5, "interfaces and browsing," and number 3,
"efficient, flexible indexing and retrieval."  Determining
logical structure enables flexible, hierarchical browsing; doing so
in a general way supports system flexibility and handling of
multiple document types.</p>

<h2> Papers </h2>

<ul>
<li><p><a href="http://www.cs.cornell.edu/Info/People/summers/segment.html">Using Non-Textual Cues for Electronic 
Document Browsing</a><br>
Co-authored with Daniela Rus.<br>
In <em>Digital Libraries:  Current Issues</em>, 
Nabil R. Adam, Bharat K. Bhargava, and Yelena Yesha, editors.
Chapter 9, pp. 129 - 162.  Lecture Notes in Computer Science series.
Springer-Verlag, 1995.</p>
<p>Versions in:</p>
	<ul>
	<li>"Geometric Algorithms and Experiments for Automated Document Structuring,"
	<em>Mathematical and Computer Modelling</em>, forthcoming.
	<li>"<a href="http://cs-tr.cs.cornell.edu:80/TR/CORNELLCS:TR94-1452?abstract">Using 
	White Space for Automated Document Structuring</a>,"
	Cornell University Computer Science Technical Report TR 94-1452.
	<li>Proceedings of the Workshop on the Principles of 
	Document Processing, Seeheim, 1994.  (PODP '94)
	</ul>

<li><p><a href="http://www.cs.cornell.edu/Info/People/summers/structures.html">Toward a Taxonomy of Logical Document Structures</a>
<br>
Electronic Publishing and the Information Superhighway:
Proceedings of the Dartmouth Institute for Advanced Graduate Studies,
pp. 124 - 133, Boston, May 1995.
<br>
Donald B. Johnson Memorial DAGS Scholar
award for the best student paper, co-recipient.
</p>

<li><p><a href="http://www.cs.cornell.edu/Info/People/summers/classify.html">Near-Wordless Document Structure 
Classification</a> <br>
Proceedings of the International Conference on Document Analysis
and Recognition, pp. 426 - 456, Montr&eacute;al, August 1995.
</p>
</ul>
</body>
</html>
